Automatic Identification of Discourse Markers in Multiparty Dialogues
Abstract
The lexical items that can serve as discourse markers (DMs) are often multifunctional. "Like" and "well", in particular, play numerous roles besides that of DM: for instance, the first can also be a verb and the second an adverb. The goal of the present study is to identify, in transcripts of multiparty dialogues, the occurrences of "like" and "well" that play a discourse or pragmatic role. DM identification is thus a binary classification task over the set of all occurrences of the tokens "like" and "well". The importance of DMs to computational linguistics is discussed first, along with previous experiments in DM identification. Then the data are briefly described, with emphasis on the DM annotation procedure and an inter-annotator agreement study. The proposed method combines lexical, prosodic/positional and sociolinguistic features with machine learning algorithms, among which decision trees are preferred. The results obtained with ten-fold cross-validation are analysed at length, focusing first on overall performance and then on the relevance of each type of feature. Feature analysis using a range of techniques shows that lexical indicators are the most reliable features for DM identification, followed by prosodic/positional features. Sociolinguistic features are slightly correlated with the use of "like" as a DM, while the dialogue act of the utterance containing a DM candidate does not seem relevant to DM identification. Treating each item separately appears to improve performance in almost all experiments. The methods and features used here improve on the performance of past experiments, and suggest that DM identification is a tractable problem, provided that enough training data is available for each DM type and that lexical features are used for each type.
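To make the task framing concrete, here is a minimal sketch, not the study's implementation. Each occurrence of "like" or "well" becomes a feature dictionary labelled DM or non-DM. A decision tree is grown over those dictionaries, and accuracy is estimated by k-fold cross-validation (k = 10 in the study). The following are all hypothetical: the feature names (`prev`, `next`, `initial`, `pause_before`), the helper functions, and the toy examples. The tree is a plain, unpruned ID3-style learner using information gain, standing in for whatever decision-tree algorithm the study used. Sociolinguistic features (e.g. speaker attributes) are omitted, but could be added as extra dictionary keys.

```python
"""Hypothetical sketch: DM identification as binary classification of
"like"/"well" occurrences, with an ID3-style decision tree and k-fold CV."""
import math
import random
from collections import Counter


def extract_features(tokens, i, pause_before):
    """Hypothetical lexical/positional/prosodic features for candidate tokens[i]."""
    return {
        "item": tokens[i],  # lexical: the candidate itself ("like" or "well")
        "prev": tokens[i - 1] if i > 0 else "<s>",  # lexical: left neighbour
        "next": tokens[i + 1] if i + 1 < len(tokens) else "</s>",  # right neighbour
        "initial": i == 0,  # positional: utterance-initial occurrence
        "pause_before": pause_before,  # prosodic: pause before the candidate
    }


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow an unpruned tree by information gain over categorical features."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: pure node or no features left

    def gain(f):
        remainder = 0.0
        for v in set(r[f] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[f] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority  # no split reduces uncertainty
    rest = [f for f in features if f != best]
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [k for k, r in enumerate(rows) if r[best] == v]
        branches[v] = build_tree([rows[k] for k in idx], [labels[k] for k in idx], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Follow the row's feature values; unseen values fall back to the majority."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        tree = branches.get(row[feature], majority)
    return tree


def cross_validate(rows, labels, k=10, seed=0):
    """Mean accuracy over k folds (k is capped at the number of examples)."""
    k = min(k, len(rows))
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for fold in (idx[j::k] for j in range(k)):
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        tree = build_tree([rows[j] for j in train], [labels[j] for j in train], list(rows[0]))
        correct = sum(predict(tree, rows[j]) == labels[j] for j in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Toy examples, invented for illustration: (tokens, candidate index, pause before?, is DM?)
TOY = [
    (["well", "i", "guess", "so"], 0, True, True),
    (["it", "went", "well"], 2, False, False),
    (["it", "was", "like", "huge"], 2, True, True),
    (["i", "like", "it"], 1, False, False),
]
rows = [extract_features(toks, i, pause) for toks, i, pause, _ in TOY]
labels = [example[3] for example in TOY]
tree = build_tree(rows, labels, list(rows[0]))
print(predict(tree, rows[0]), cross_validate(rows, labels, k=4))
```

Including the candidate item as a feature lets a single tree branch differently for "like" and "well". Training one tree per item is the alternative closer to the item-specific treatment the abstract reports as helpful. Four toy examples say nothing about real accuracy; meaningful estimates require the annotated corpus.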
Publication date: 2006